Authors: Joanna Działo 148260, Wojciech Majewski 148253
The dataset we chose is, as per suggestion, Caltech-101.
PATH = "/content/data/101_ObjectCategories"
We put all the imports here, to avoid redundancy:
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import confusion_matrix
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from keras.losses import CategoricalCrossentropy
from keras import datasets, layers, models
from keras.callbacks import EarlyStopping
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, InputLayer
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
import itertools
from collections import Counter
import random
from sklearn.metrics import classification_report
! DISCLAIMER ! These commands work on Google Colab; we do not guarantee that they will work on a regular Windows/Mac/Linux system. In this part we also remove two folders from the dataset, but we elaborate on that further in the summary in Part 1.7)
!wget -O caltech-101.zip https://data.caltech.edu/records/mzrjq-6wc02/files/caltech-101.zip
!unzip caltech-101.zip -d caltech
!mkdir data
!tar -xvf "/content/caltech/caltech-101/101_ObjectCategories.tar.gz" -C "/content/data"
!rm -rf caltech-101.zip
!rm -rf caltech
!rm -rf /content/data/101_ObjectCategories/Faces_easy
!rm -rf /content/data/101_ObjectCategories/BACKGROUND_Google
We implemented some useful functions for displaying images. The first one shows a single image and the second shows 25, with or without labels depending on the user's mood.
def imshow(image, label = None):
    plt.imshow(cv2.cvtColor(image.astype('uint8'), cv2.COLOR_BGR2RGB), cmap=plt.cm.binary)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    if label is None:
        plt.show()
    else:
        plt.xlabel(label)
        plt.show()
def display_images(images, labels = None):
    fig = plt.figure(figsize=(10,10))
    for i in range(25):
        plt.subplot(5,5,i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(cv2.cvtColor(images[i].astype('uint8'), cv2.COLOR_BGR2RGB), cmap=plt.cm.binary)
        if labels is not None:
            plt.xlabel(labels[i])
    plt.show()
Here we set the image size which is used in resizing and setting the input layer of the NN.
img_size = 64
This function loads the images of one category. The limit parameter, if used, limits the number of images that are loaded.
def load_images(path, category, limit = None):
    category_path = os.path.join(path, category)
    images_paths = [os.path.join(category_path, img) for img in os.listdir(category_path)]
    images = [cv2.resize(cv2.imread(f, 1), (img_size, img_size)) for f in images_paths]
    labels = [category for i in range(len(images))]
    if limit is None:
        return images, labels
    else:
        limit_norm = min(limit, len(images))
        return images[:limit_norm], labels[:limit_norm]
This function creates the dataset from a given path. It takes two optional parameters: categories, a user-provided list of categories (if left blank, all categories are used), and limit, which caps the number of images per class and helps avoid over- or under-representation.
def create_dataset(path, categories = None, limit = None):
    if categories is None:
        categories = os.listdir(path)
    X = []
    y = []
    for category in categories:
        images, labels = load_images(path, category, limit)
        X.extend(images)
        y.extend(labels)
    X = np.asarray(X)
    y = np.asarray(y)
    return X, y
Step verification:
faces_images, faces_labels = load_images(PATH, "Faces")
imshow(faces_images[0], faces_labels[0])
print(faces_images[0].shape)
display_images(faces_images[:25], faces_labels[:25])
X_all, y_all = create_dataset(PATH)
Step verification:
imshow(X_all[0], y_all[0])
print(X_all[0].shape)
imshow(X_all[200], y_all[200])
print(X_all[200].shape)
imshow(X_all[1000], y_all[1000])
print(X_all[1000].shape)
Note: Some of these operations can be performed during step 1). If that's the case, don't do them again here.
Unify the images:
Step verification:
Check what the image you selected in step 1) looks like now.
All images are RGB. That's because we use OpenCV to load them: cv2.imread with the default flag returns a 3-channel image for every JPEG, so even a grayscale source comes out with three (identical) channels.
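As a quick illustration (a minimal sketch with a synthetic array, not part of the pipeline): loading a grayscale source with cv2.imread's default flag is equivalent to replicating the single intensity plane into three identical channels:

```python
import numpy as np

# Synthetic stand-in for a grayscale image: a single 2-D plane of intensities.
gray = (np.arange(64 * 64) % 256).astype(np.uint8).reshape(64, 64)

# What cv2.imread(path) effectively returns for a grayscale JPEG:
# the one plane replicated into three identical (B, G, R) channels.
bgr_like = np.stack([gray, gray, gray], axis=-1)

print(bgr_like.shape)                                 # (64, 64, 3)
print((bgr_like[..., 0] == bgr_like[..., 2]).all())   # True: channels identical
```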
Here is our function to standardize the images (formula from the task description):
def standardize_dataset(dataset):
    mean = np.mean(dataset, axis=(0,1,2))
    std = np.sqrt(((dataset - mean)**2).mean(axis=(0,1,2)))
    result = (dataset - mean) / std
    return result
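Written out, the function above applies per-channel standardization over the whole dataset:

$$z_{nijc} = \frac{x_{nijc} - \mu_c}{\sigma_c}, \qquad \mu_c = \frac{1}{NHW}\sum_{n,i,j} x_{nijc}, \qquad \sigma_c = \sqrt{\frac{1}{NHW}\sum_{n,i,j} \left(x_{nijc} - \mu_c\right)^2}$$

where $n$ indexes images, $(i, j)$ pixels and $c$ the three colour channels (axes 0, 1, 2 in the code).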
standardized_X = standardize_dataset(X_all)
Step verification:
imshow(X_all[0])
imshow(standardized_X[0])
Let's not waste RAM, though. (Fun fact: managing RAM proved to be the most difficult part of this project. Sadly, our personal GPU-free computers took ages to train the model, so we had to stick to Colab.)
del standardized_X
del X_all
del y_all
Note: After selecting a subset of classes, check what images contain these classes - are they appropriate for the classification problem? "BACKGROUND_Google" is probably not the best choice :)
Step verification:
We have chosen to use 15 classes (following the idea of 'start slow and build up'). The choice of classes was based on counting the number of images per class and picking the 15 most populated ones (limit of representatives = 100, to avoid an imbalanced distribution).
def get_n_categories(n = 15):
    categories = os.listdir(PATH)
    classes = dict()
    for category in categories:
        category_path = os.path.join(PATH, category)
        images_paths = os.listdir(category_path)
        classes[category] = len(images_paths)
    sorted_classes = sorted(classes.items(), key=lambda x: x[1], reverse=True)
    limit = sorted_classes[n][1]
    return [x[0] for x in sorted_classes[:n]], limit
new_categories, limit = get_n_categories()
new_categories
X_15, y_15 = create_dataset(PATH, new_categories, limit = limit)
X_15.shape
X_15 = standardize_dataset(X_15)
X_15.shape
Now, would train_test_split from sklearn work? Yes, but we had already implemented this function in another project, and it would be a waste not to use it here. So here is our own function, which creates the train, test and validation sets in a stratified manner, with a user-defined percentage of train/test/val data.
def distribute_train_test_val(X, y, test_split = 0.15, val_split = 0.15):
    classes = list(Counter(y).keys())
    number = list(Counter(y).values())
    n = len(classes)
    X_train, y_train, X_test, y_test, X_val, y_val = [], [], [], [], [], []
    beg = 0
    # X and y are grouped by class, so each slice below contains a single
    # class; shuffling set_x alone is safe because all its labels are equal.
    for i in range(n):
        test_size = int(number[i] * test_split)
        val_size = int(number[i] * val_split)
        train_size = number[i] - test_size - val_size
        set_x = X[beg : beg + test_size]
        random.shuffle(set_x)
        set_y = y[beg : beg + test_size]
        X_test.extend(set_x)
        y_test.extend(set_y)
        beg += test_size
        set_x = X[beg : beg + val_size]
        random.shuffle(set_x)
        set_y = y[beg : beg + val_size]
        X_val.extend(set_x)
        y_val.extend(set_y)
        beg += val_size
        set_x = X[beg : beg + train_size]
        random.shuffle(set_x)
        set_y = y[beg : beg + train_size]
        X_train.extend(set_x)
        y_train.extend(set_y)
        beg += train_size
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)
    X_val = np.asarray(X_val)
    y_val = np.asarray(y_val)
    X_test = np.asarray(X_test)
    y_test = np.asarray(y_test)
    return X_train, y_train, X_test, y_test, X_val, y_val
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X_15, y_15)
Step verification of the third point had to be done here, as one-hot encoding the data makes it difficult later:
classes_train = list(Counter(y_train).keys())
number_train = list(Counter(y_train).values())
for i in range(len(classes_train)):
    print(classes_train[i], number_train[i])
classes_test = list(Counter(y_test).keys())
number_test = list(Counter(y_test).values())
for i in range(len(classes_test)):
    print(classes_test[i], number_test[i])
One-hot encoding the y's:
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
y_val = pd.get_dummies(y_val)
y_train.head()
mapping = y_train.columns
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)
y_val = np.asarray(y_val)
Step verification:
print(X_train.shape)
print(X_test.shape)
print(X_val.shape)
print(y_train.shape)
print(y_test.shape)
print(y_val.shape)
imshow(X_train[0])
The head of y_train showed that the first image is labeled as airplanes, which is true.
Suggestions (you should probably try different settings and choose the best one):
Note: You can add Activation as a separate layer or as activation='relu' parameter in Conv2D
Step verification:
Compile the model with 'adam' optimizer, the Categorical Crossentropy as the loss function, and measure the accuracy value.
Here's our model (it's defined inside a function for further processing in Part 2). We went the naive route and did everything as suggested (3 convolutional blocks, etc.); the description of our model is essentially the list of suggestions. Step verification is done inside the function.
input_shape = (img_size, img_size, 3)
def create_model(num_categories, conv_neurons = [32, 32, 32], dropout = [0.3, 0.3, 0.3], dense_neurons = [128, 200], activations = ['relu', 'relu', 'relu', 'relu', 'relu'], normalize_flag = True):
    model = Sequential()
    model.add(InputLayer(input_shape=input_shape))
    model.add(Conv2D(conv_neurons[0], kernel_size=(3, 3), padding='same', activation=activations[0]))
    if normalize_flag:
        model.add(BatchNormalization())
    model.add(Dropout(dropout[0]))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(conv_neurons[1], kernel_size=(3, 3), padding='same', activation=activations[1]))
    if normalize_flag:
        model.add(BatchNormalization())
    model.add(Dropout(dropout[1]))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(conv_neurons[2], kernel_size=(3, 3), padding='same', activation=activations[2]))
    if normalize_flag:
        model.add(BatchNormalization())
    model.add(Dropout(dropout[2]))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(dense_neurons[0], activation=activations[3]))
    model.add(Dense(dense_neurons[1], activation=activations[4]))
    model.add(Dense(num_categories, activation='softmax'))
    model.compile(loss=tf.keras.losses.categorical_crossentropy,
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model
model = create_model(15)
model.summary()
Suggested hyperparameters:
Step verification:
After completing the learning process, show:
Note: Functions to display learning curves and confusion matrices based on the model will be useful in the Part 2.
Functions for displaying learning curves, the confusion matrix (CM) and the classification report (CR):
def plot_accuracy(history):
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper left')
    plt.show()

def plot_loss(history):
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper left')
    plt.show()

def get_predictions(model, X_test, y_test, mapping):
    test_loss = model.evaluate(X_test, y_test)
    predictions = model.predict(X_test)
    pred_labels = np.argmax(predictions, axis=1)
    y_labels = np.asarray(y_test)
    y_labels = np.argmax(y_labels, axis=1)
    labels = []
    for i in range(25):
        labels.append(str(mapping[pred_labels[i]]) + " " + str(mapping[y_labels[i]]))
    X = np.asarray(X_test[:25])
    display_images(X, labels)
    return y_labels, pred_labels

def display_confusion_matrix(y_labels, pred_labels, mapping):
    cm = confusion_matrix(y_labels, pred_labels)
    ax = plt.axes()
    sns.heatmap(cm, annot=True,
                annot_kws={"size": 5},
                xticklabels=mapping,
                yticklabels=mapping, ax=ax)
    ax.set_title('Confusion matrix')
    plt.show()
    cr = classification_report(y_labels, pred_labels, target_names=mapping)
    return cr

def plot_all(history, model, X_test, y_test, mapping):
    plot_accuracy(history)
    plot_loss(history)
    y_labels, pred_labels = get_predictions(model, X_test, y_test, mapping)
    return display_confusion_matrix(y_labels, pred_labels, mapping)
Here's an interesting addition that differs from the suggestions: we decided to use an ImageDataGenerator (our NN was overfitting, and after scouring multiple Stack Overflow posts we found one suggesting this as a solution; it still overfits, but not as fast as it used to).
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)
We have decided to use the recommended formulas for steps_per_epoch and validation_steps, set the patience to 10 and told EarlyStopping to restore to the best weights achieved (all to avoid our overfitting problem).
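The recommended formulas reduce to integer division of the set size by the batch size. A tiny sketch with hypothetical set sizes (the real ones depend on the split above):

```python
batch_size = 32
n_train, n_val = 1050, 225   # hypothetical set sizes, for illustration only

steps_per_epoch = n_train // batch_size   # floor division: full batches per epoch
validation_steps = n_val // batch_size

print(steps_per_epoch, validation_steps)  # 32 7
```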
history = model.fit(datagen.flow(X_train, y_train, batch_size=32, shuffle=True),
                    epochs=250,
                    steps_per_epoch=X_train.shape[0]//32,
                    validation_data=(X_val, y_val),
                    validation_steps=X_val.shape[0]//32,
                    callbacks=[
                        EarlyStopping(
                            monitor="val_accuracy",
                            patience=10,
                            restore_best_weights=True,
                            verbose=1
                        )
                    ])
cr = plot_all(history, model, X_test, y_test, mapping)
print(cr)
Step verification:
Checking the operation of both functions by:
!touch model0.json
!touch weights0.h5
def save_model(model, model_path, weights_path):
    model_json = model.to_json()
    with open(model_path, "w") as json_file:
        json_file.write(model_json)
    model.save_weights(weights_path)

def load_model(model_path, weights_path):
    with open(model_path, "r") as json_file:
        model = tf.keras.models.model_from_json(json_file.read())
    model.load_weights(weights_path)
    return model
Step verification:
save_model(model, '/content/model0.json', '/content/weights0.h5')
model_json = load_model('/content/model0.json', '/content/weights0.h5')
model_json.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
test_loss = model_json.evaluate(X_test, y_test)
test_loss
y_labels, pred_labels = get_predictions(model_json, X_test, y_test, mapping)
display_confusion_matrix(y_labels, pred_labels, mapping)
Describe your observations on the tasks performed. Supporting questions:
Step verification:
Use Markdowns to describe your conclusions or put them into the report.
The summary is based on our best result; let us load that model in.
model_json = load_model('/content/model3.json', '/content/weights3.h5')
model_json.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
test_loss = model_json.evaluate(X_test, y_test)
test_loss
y_labels, pred_labels = get_predictions(model_json, X_test, y_test, mapping)
cr = display_confusion_matrix(y_labels, pred_labels, mapping)
print(cr)
On the train set the model achieved loss 0.2850 and accuracy 0.9098; on the validation set loss 2.0039 and accuracy 0.6222; on the independent test set loss 2.0159 and accuracy 0.6778. Overall the result isn't bad, but it's not great either. We will experiment with the parameters in the later part of the report to see if we can improve it.
The main modifications we made relate to the dataset, Caltech-101. It is sadly quite unbalanced, with some redundant folders. We removed those folders (Faces_easy and BACKGROUND_Google) to avoid feeding our model unimportant data (Faces_easy and Faces are basically the same thing, and BACKGROUND_Google has no pattern to it). To balance the classes, we set a limit on the number of representatives per class (100) and used ImageDataGenerator from the keras library.
The model is clearly overfitting. We've done everything we could think of to stop it, but our best attempts only slowed it down. With more data we would feed the model more, but for now we've done our best. We used BatchNormalization, Dropout and MaxPooling to fight this problem, and we configured EarlyStopping to restore the best weights.
The categories classified correctly 100% of the time are Faces, menorah, airplanes and car_side. Bonsai was classified correctly 92% of the time, while watch achieved 83%. No class was misclassified 100% of the time; the lowest classification rate was achieved by Leopards, a whopping 25%. It was misclassified as an airplane 5 times. One conclusion we've drawn from the confusion matrix is that initially many images seem to have been classified as airplanes, since that's where a lot of classes were mistakenly put, some completely unrelated, like Leopards and Grand_piano.
Firstly, we think the number of classes is too high: since the limit is set by the size of the least populated selected class, we got only 80 images per class here. We hope that tuning the dropout will provide some improvement. We also want to try different preprocessing approaches; maybe standardizing just isn't the best option for our dataset, and a different function (or none at all) will yield better results.
The purpose of the Part 2 is to examine the dependence of the quality of the resulting model on factors such as hyperparameters, model structure, number of training data, number of decision classes, etc. Below is a suggestion of simple tasks - most of them consist of a simple experiment to compare several models that differ in some detail.
You don't have to complete all of the tasks, you can choose the issues that seem interesting to you. You can also suggest your own ideas for the experiment definition (in this case - consult with the teacher). As mentioned above, each task is worth $20\%$ of the project grade (the value of the last task is doubled, because it requires much more effort). Assuming you've completed the Part 1, as you may have already calculated:
Of course, you can complete more than 3 tasks, then the extra points will ensure that even if one of the tasks is not completely correct, you still have an opportunity to get $100\%$.
In addition to implementing the code and running experiments to carry out a given task, you must draw conclusions about the results. In that case, it may be helpful to collect the following information about the compared models:
In general, there are many possibilities to visualize data, to compare models with each other and draw conclusions from it. Only some suggestions are visible above, it is up to you what and how you decide to show in your conclusions and observations.
Each of the following tasks assumes the implementation of an appropriate code that will perform a given experiment - and therefore it will probably modify the dataset or model (its structure or hyperparameters). After implementing the experiment, write:
After obtaining the results for the compared models, datasets or any other aspect, describe your observations and conclusions about the experiment. Observations may in particular concern:
In general, it is worth describing everything that you find interesting in the context of the specific task. The description of the conclusions should be included in Markdown(s) (or in the report if you want to prepare one).
2 classes
num_of_classes = 2
new_categories, limit = get_n_categories(num_of_classes)
if limit > 250:
    limit = 250
X, y = create_dataset(PATH, new_categories, limit = limit)
X = standardize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
y_val = pd.get_dummies(y_val)
mapping = y_train.columns
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)
y_val = np.asarray(y_val)
model = create_model(num_of_classes)
history = model.fit(datagen.flow(X_train, y_train, batch_size=32, shuffle=True),
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
verbose = 0,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
y_labels, pred_labels = get_predictions(model, X_test, y_test, mapping)
cr = display_confusion_matrix(y_labels, pred_labels, mapping)
5 classes
num_of_classes = 5
new_categories, limit = get_n_categories(num_of_classes)
if limit > 250:
    limit = 250
X, y = create_dataset(PATH, new_categories, limit = limit)
X = standardize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
y_val = pd.get_dummies(y_val)
mapping = y_train.columns
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)
y_val = np.asarray(y_val)
model = create_model(num_of_classes)
history = model.fit(datagen.flow(X_train, y_train, batch_size=32, shuffle=True),
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
y_labels, pred_labels = get_predictions(model, X_test, y_test, mapping)
cr = display_confusion_matrix(y_labels, pred_labels, mapping)
10 classes
num_of_classes = 10
new_categories, limit = get_n_categories(num_of_classes)
if limit > 250:
    limit = 250
X, y = create_dataset(PATH, new_categories, limit = limit)
X = standardize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
y_val = pd.get_dummies(y_val)
mapping = y_train.columns
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)
y_val = np.asarray(y_val)
model = create_model(num_of_classes)
history = model.fit(datagen.flow(X_train, y_train, batch_size=32, shuffle=True),
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
y_labels, pred_labels = get_predictions(model, X_test, y_test, mapping)
cr = display_confusion_matrix(y_labels, pred_labels, mapping)
20 classes
num_of_classes = 20
new_categories, limit = get_n_categories(num_of_classes)
if limit > 250:
    limit = 250
X, y = create_dataset(PATH, new_categories, limit = limit)
X = standardize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
y_val = pd.get_dummies(y_val)
mapping = y_train.columns
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)
y_val = np.asarray(y_val)
model = create_model(num_of_classes)
history = model.fit(datagen.flow(X_train, y_train, batch_size=32, shuffle=True),
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
y_labels, pred_labels = get_predictions(model, X_test, y_test, mapping)
cr = display_confusion_matrix(y_labels, pred_labels, mapping)
50 classes
num_of_classes = 50
new_categories, limit = get_n_categories(num_of_classes)
if limit > 250:
    limit = 250
X, y = create_dataset(PATH, new_categories, limit = limit)
X = standardize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
y_val = pd.get_dummies(y_val)
mapping = y_train.columns
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)
y_val = np.asarray(y_val)
model = create_model(num_of_classes)
history = model.fit(datagen.flow(X_train, y_train, batch_size=32, shuffle=True),
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=20,
restore_best_weights=True,
verbose = 1
)
])
y_labels, pred_labels = get_predictions(model, X_test, y_test, mapping)
cr = display_confusion_matrix(y_labels, pred_labels, mapping)
| | Loss | Accuracy |
|---|---|---|
| 2 classes | 0.39 | 0.95 |
| 5 classes | 0.97 | 0.76 |
| 10 classes | 1.08 | 0.81 |
| 20 classes | 2.91 | 0.45 |
| 50 classes | 9.19 | 0.25 |
Well, the model got much better accuracy when there were fewer classes to predict, as we hoped. The results disappointed us a bit, though, as the accuracy didn't improve quite as much as we had hoped.
The conclusion we have drawn from this experiment confirms something we already suspected: the model tends to assign all elements to one class (while also having a class it very rarely predicts, but only with more than 5 classes).
There's also the problem that training is noisy and converges very fast. We thought a bigger patience might solve this, but found that it is not the case; we increased the patience to 20 for the 50-class run because it made sense there.
new_categories, limit = get_n_categories(15)
X, y = create_dataset(PATH, new_categories, limit = limit)
X = standardize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
y_val = pd.get_dummies(y_val)
mapping = y_train.columns
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)
y_val = np.asarray(y_val)
drop_values = [0.1, 0.2, 0.3, 0.4, 0.5]
all_dropouts = list(itertools.combinations_with_replacement(drop_values, 3))
accu_loss_dropout = []
for dropout in all_dropouts:
    model = create_model(15, dropout = list(dropout))
    history = model.fit(X_train, y_train,
                        batch_size=32,
                        epochs=250,
                        steps_per_epoch=X_train.shape[0]//32,
                        validation_data=(X_val, y_val),
                        validation_steps=X_val.shape[0]//32,
                        verbose=0,
                        callbacks=[
                            EarlyStopping(
                                monitor="val_accuracy",
                                patience=10,
                                restore_best_weights=True,
                                verbose=1
                            )
                        ])
    results = model.evaluate(X_test, y_test, batch_size=32)
    accu_loss_dropout.append(results)
bar_dropouts_accu = dict()
bar_dropouts_loss = dict()
for i in range(len(accu_loss_dropout)):
    bar_dropouts_accu[all_dropouts[i]] = accu_loss_dropout[i][1]
    bar_dropouts_loss[all_dropouts[i]] = accu_loss_dropout[i][0]
We ran into a display issue, because matplotlib's bar chart only rendered 32 elements cleanly and we have 35, so we split the data across two subplots.
def splitlist(input_list):
    n = len(input_list)//2
    first_half = input_list[:n]
    sec_half = input_list[n:]
    return first_half, sec_half
x1, x2 = splitlist(range(0, len(all_dropouts)))
y1_a, y2_a = splitlist(list(bar_dropouts_accu.values()))
y1_l, y2_l = splitlist(list(bar_dropouts_loss.values()))
fig = plt.figure(figsize = (20, 10))
X_axis = np.arange(len(x1))
plt.subplot(1, 2, 1)
plt.bar(X_axis - 0.2, y1_a, color ='maroon', width = 0.4, label="accuracy")
plt.bar(X_axis + 0.2, y1_l, color ='green', width = 0.4, label="loss")
plt.xlabel("Dropout values")
plt.legend(loc="upper left")
plt.ylim(0, 3)
X_axis = np.arange(len(x2))
plt.subplot(1, 2, 2)
plt.bar(X_axis - 0.2, y2_a, color ='maroon', width = 0.4, label="accuracy")
plt.bar(X_axis + 0.2, y2_l, color ='green', width = 0.4, label="loss")
plt.xlabel("Dropout values")
plt.legend(loc="upper left")
plt.ylim(0, 3)
plt.tight_layout()
plt.show()
Looking at the plots, the best accuracy-to-loss ratio was achieved by dropout combination no. 8. Let's peek at it.
all_dropouts[8]
This is a nice result and it might help with overfitting: low dropout values at the beginning let the model learn something, and then the 0.5 dropout cuts out the unimportant parts.
best_dropout = all_dropouts[8]
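The pick above was made by eye from the bar charts, but the same criterion can be applied programmatically. A minimal sketch with made-up (loss, accuracy) pairs (hypothetical values, mirroring the structure of the bar_dropouts_* dictionaries above):

```python
# Hypothetical results: dropout combination -> (loss, accuracy).
results = {
    (0.1, 0.1, 0.5): (1.20, 0.66),
    (0.1, 0.2, 0.3): (1.45, 0.63),
    (0.3, 0.3, 0.3): (1.60, 0.60),
}

# Choose the combination with the highest accuracy-to-loss ratio.
best = max(results, key=lambda combo: results[combo][1] / results[combo][0])
print(best)  # (0.1, 0.1, 0.5)
```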
We didn't use the ImageDataGenerator here, to make the comparison more reliable (less randomness between runs).
accu_loss_batch = []
for flag in [True, False]:
    model = create_model(15, normalize_flag = flag, dropout = best_dropout)
    history = model.fit(X_train, y_train,
                        batch_size=32,
                        epochs=250,
                        steps_per_epoch=X_train.shape[0]//32,
                        validation_data=(X_val, y_val),
                        validation_steps=X_val.shape[0]//32,
                        verbose=0,
                        callbacks=[
                            EarlyStopping(
                                monitor="val_accuracy",
                                patience=10,
                                restore_best_weights=True,
                                verbose=1
                            )
                        ])
    results = model.evaluate(X_test, y_test, batch_size=32)
    accu_loss_batch.append(results)
accu_loss_batch
We didn't use the ImageDataGenerator here either, for the same reasons as before. What's surprising is that BatchNormalization actually seemed to worsen the model's performance. That was entirely unexpected, but we kept it in mind and moved on to the next task.
def substracting_dataset(dataset):
    mean = np.mean(dataset, axis=(0,1,2))
    result = dataset - mean
    return result

def normalize_dataset(dataset):
    mini = np.min(dataset, axis=(0,1,2))
    maxi = np.max(dataset, axis=(0,1,2))
    result = (dataset - mini) / (maxi - mini)
    return result
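For reference, the three preprocessing variants compared in this experiment, each computed per channel over the whole dataset:

$$\text{mean subtraction: } x' = x - \mu_c, \qquad \text{min-max normalization: } x' = \frac{x - \min_c}{\max_c - \min_c}, \qquad \text{standardization: } x' = \frac{x - \mu_c}{\sigma_c}$$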
new_categories, limit = get_n_categories(10)
X, y = create_dataset(PATH, new_categories, limit = limit)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train, y_test, y_val = pd.get_dummies(y_train), pd.get_dummies(y_test), pd.get_dummies(y_val)
mapping = y_train.columns
y_train, y_test, y_val = np.asarray(y_train), np.asarray(y_test), np.asarray(y_val)
model = create_model(10, normalize_flag = False, dropout = best_dropout)
history = model.fit(X_train, y_train,
batch_size=32,
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
verbose = 0,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
cr = plot_all(history, model, X_test, y_test, mapping)
print(cr)
X_substracted = substracting_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X_substracted, y)
y_train, y_test, y_val = pd.get_dummies(y_train), pd.get_dummies(y_test), pd.get_dummies(y_val)
y_train, y_test, y_val = np.asarray(y_train), np.asarray(y_test), np.asarray(y_val)
model = create_model(10, normalize_flag = False, dropout = best_dropout)
history = model.fit(X_train, y_train,
batch_size=32,
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
verbose = 0,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
cr = plot_all(history, model, X_test, y_test, mapping)
print(cr)
X_normalized = normalize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X_normalized, y)
y_train, y_test, y_val = pd.get_dummies(y_train), pd.get_dummies(y_test), pd.get_dummies(y_val)
y_train, y_test, y_val = np.asarray(y_train), np.asarray(y_test), np.asarray(y_val)
model = create_model(10, normalize_flag = False, dropout = best_dropout)
history = model.fit(X_train, y_train,
batch_size=32,
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
verbose = 0,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
cr = plot_all(history, model, X_test, y_test, mapping)
print(cr)
X_standardized = standardize_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X_standardized, y)
y_train, y_test, y_val = pd.get_dummies(y_train), pd.get_dummies(y_test), pd.get_dummies(y_val)
y_train, y_test, y_val = np.asarray(y_train), np.asarray(y_test), np.asarray(y_val)
model = create_model(10, normalize_flag = False, dropout = best_dropout)
history = model.fit(X_train, y_train,
batch_size=32,
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
verbose = 0,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=10,
restore_best_weights=True,
verbose = 1
)
])
cr = plot_all(history, model, X_test, y_test, mapping)
print(cr)
These results amused us: multiple websites said that standardizing and normalizing a dataset are among the best ways to prevent overfitting, yet that was not the case here. Our model performed best on mean-subtracted data (81% test accuracy), did quite well on raw data (73%), performed mediocrely on min-max-normalized data (70%) and worst on standardized data (68%).
X_substracted = substracting_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X_substracted, y)
y_train, y_test, y_val = pd.get_dummies(y_train), pd.get_dummies(y_test), pd.get_dummies(y_val)
y_train, y_test, y_val = np.asarray(y_train), np.asarray(y_test), np.asarray(y_val)
activation = ['sigmoid', 'tanh', 'relu']
all_activations = list(itertools.combinations_with_replacement(activation, 5))
accu_loss_activations = []
for activ in all_activations:
    model = create_model(10, normalize_flag = False, dropout = best_dropout, activations = activ)
    history = model.fit(X_train, y_train,
                        batch_size=32,
                        epochs=250,
                        steps_per_epoch=X_train.shape[0]//32,
                        validation_data=(X_val, y_val),
                        validation_steps=X_val.shape[0]//32,
                        verbose=0,
                        callbacks=[
                            EarlyStopping(
                                monitor="val_accuracy",
                                patience=10,
                                restore_best_weights=True,
                                verbose=1
                            )
                        ])
    results = model.evaluate(X_test, y_test, batch_size=32)
    accu_loss_activations.append(results)
bar_activations_accu = dict()
bar_activations_loss = dict()
for i in range(len(accu_loss_activations)):
    bar_activations_accu[all_activations[i]] = accu_loss_activations[i][1]
    bar_activations_loss[all_activations[i]] = accu_loss_activations[i][0]
x = range(0, len(all_activations))
y_a = list(bar_activations_accu.values())
y_l = list(bar_activations_loss.values())
fig = plt.figure(figsize = (10, 10))
X_axis = np.arange(len(x))
plt.bar(X_axis - 0.2, y_a, color ='maroon', width = 0.4, label="accuracy")
plt.bar(X_axis + 0.2, y_l, color ='green', width = 0.4, label="loss")
plt.xlabel("Activation combinations")
plt.legend(loc="upper left")
plt.show()
We spotted three candidates for the optimal accuracy / loss value. Let's see them.
print(all_activations[0], accu_loss_activations[0])
print(all_activations[11], accu_loss_activations[11])
print(all_activations[20], accu_loss_activations[20])
best_activation = all_activations[11]
OK, the best combination seems to be the middle one. It achieved 75% accuracy and 0.91 loss on the test set.
Here is our last compilation of the model, with all of the tweaks applied. Let's see whether it outperforms the first one!
new_categories, limit = get_n_categories(15)
X, y = create_dataset(PATH, new_categories, limit = limit)
X = substracting_dataset(X)
X_train, y_train, X_test, y_test, X_val, y_val = distribute_train_test_val(X, y)
y_train, y_test, y_val = pd.get_dummies(y_train), pd.get_dummies(y_test), pd.get_dummies(y_val)
mapping = y_train.columns
y_train, y_test, y_val = np.asarray(y_train), np.asarray(y_test), np.asarray(y_val)
model = create_model(15, normalize_flag = False, dropout = best_dropout, activations = best_activation)
history = model.fit(X_train, y_train,
batch_size=32,
epochs=250,
steps_per_epoch=X_train.shape[0]//32,
validation_data=(X_val, y_val),
validation_steps=X_val.shape[0]//32,
verbose = 0,
callbacks=[
EarlyStopping(
monitor="val_accuracy",
patience=20,
restore_best_weights=True,
verbose=0
)
])
cr = plot_all(history, model, X_test, y_test, mapping)
print(cr)
This is our final result. It's still not perfect, but we think the overfitting problem has been mitigated as well as we could manage. The loss for this model is much lower than for the previous ones, even though the accuracy is mediocre; the dataset split is still random, which makes results vary between runs. That's why we loaded our previous best model to compare.
model_json = load_model('/content/model3.json', '/content/weights3.h5')
model_json.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
test_loss = model_json.evaluate(X_test, y_test)
test_loss
The models are sadly not directly comparable, mostly because the first one was trained on standardized, not mean-subtracted, data.
!pip freeze > requirements.txt
!jupyter nbconvert --to html /content/DL_2_148260_148253.ipynb